Data visualization is the graphic representation of data. It involves producing images that communicate relationships among the represented data to viewers. Visualizing data is an essential part of data analysis and machine learning. We'll use Python libraries Matplotlib and Seaborn to learn and apply some popular data visualization techniques. We'll use the words chart, plot, and graph interchangeably in this tutorial.
To begin, let's install and import the libraries. We'll use the matplotlib.pyplot module for basic plots like line & bar charts. It is often imported with the alias plt. We'll use the seaborn module for more advanced plots. It is commonly imported with the alias sns.
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#from importlib import reload
#plt=reload(plt)
Notice that we also include the special command %matplotlib inline to ensure that our plots are shown and embedded within the Jupyter notebook itself. Without this command, plots may sometimes show up in pop-up windows.
The line chart is one of the simplest and most widely used data visualization techniques. A line chart displays information as a series of data points or markers connected by straight lines. You can customize the shape, size, color, and other aesthetic elements of the lines and markers for better visual clarity.
Here's a Python list showing the yield of apples (tons per hectare) over six years in an imaginary country called Kanto.
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931 ]
yield_apples
[0.895, 0.91, 0.919, 0.926, 0.929, 0.931]
We can visualize how the yield of apples changes over time using a line chart. To draw a line chart, we can use the plt.plot function.
plt.plot(yield_apples)
[<matplotlib.lines.Line2D at 0x7cf83f095a60>]
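Calling plt.plot draws the line chart and also returns a list of the lines it drew, [<matplotlib.lines.Line2D at 0x7cf83f095a60>], which is shown within the output. We can include a semicolon (;) at the end of the last statement in the cell to avoid showing this output and display just the graph.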
plt.plot(yield_apples);
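To see what that output refers to, here is a minimal sketch that captures the return value of plt.plot instead of discarding it. The variable names lines, line, x_values, and y_values are illustrative and not part of the tutorial.

```python
import matplotlib.pyplot as plt

yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931]

plt.figure()
# plt.plot returns a list with one Line2D object per line drawn
lines = plt.plot(yield_apples)
line = lines[0]

# With a single argument, the x values default to the list indexes 0 to 5
x_values = list(line.get_xdata())
y_values = list(line.get_ydata())
print(len(lines), x_values, y_values)
```

Assigning the result to a variable also keeps it out of the cell output, much like the trailing semicolon does.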
The X-axis of the plot currently shows list element indexes 0 to 5. The plot would be more informative if we could display the year for which we're plotting the data. We can do this by passing two arguments to plt.plot.
years = [2010, 2011, 2012, 2013, 2014, 2015]
yield_apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931 ]
plt.plot(years, yield_apples);
Axis Labels
We can add labels to the axes to show what each axis represents using the plt.xlabel and plt.ylabel methods.
plt.plot(years, yield_apples)
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)');
You can invoke the plt.plot function once for each line to plot multiple lines in the same graph. Let's compare the yields of apples vs. oranges in Kanto.
years = range(2000, 2012)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934, 0.936, 0.937, 0.9375, 0.9372, 0.939]
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908, 0.907, 0.904, 0.901, 0.898, 0.9, 0.896 ]
plt.plot(years, apples)
plt.plot(years, oranges)
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)');
To differentiate between multiple lines, we can include a legend within the graph using the plt.legend function. We can also set a title for the chart using the plt.title function.
plt.plot(years, apples)
plt.plot(years, oranges)
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yields in Kanto')
plt.legend(['Apples', 'Oranges']);
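Passing a list to plt.legend relies on the names being in the same order as the plt.plot calls. As an alternative sketch, each line can carry its own name through the label argument of plt.plot, and plt.legend can then be called without arguments. The variable legend_labels is only there to show what the legend ends up containing.

```python
import matplotlib.pyplot as plt

years = range(2000, 2012)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934, 0.936, 0.937, 0.9375, 0.9372, 0.939]
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908, 0.907, 0.904, 0.901, 0.898, 0.9, 0.896]

plt.figure()
# Attach a label to each line when it is drawn
plt.plot(years, apples, label='Apples')
plt.plot(years, oranges, label='Oranges')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yields in Kanto')

# With no arguments, plt.legend picks up the labels set above
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```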
We can also show markers for the data points on each line using the marker argument of plt.plot. Matplotlib provides many different markers, like a circle, cross, square, diamond, etc.
plt.plot(years, apples, marker='o')
plt.plot(years, oranges, marker='x')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, marker='^')
plt.plot(years, oranges, marker='v')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, marker='h')
plt.plot(years, oranges, marker='d')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
The plt.plot function supports many arguments for styling lines and markers:
- color or c: Set the color of the line (supported colors)
- linestyle or ls: Choose between a solid or dashed line
- linewidth or lw: Set the width of a line
- markersize or ms: Set the size of markers
- markeredgecolor or mec: Set the edge color for markers
- markeredgewidth or mew: Set the edge width for markers
- markerfacecolor or mfc: Set the fill color for markers
- alpha: Opacity of the plot
plt.plot(years, apples, marker='s', c='b', ls='-', lw=2, ms=8, mew=2, mec='navy')
plt.plot(years, oranges, marker='o', c='r', ls='--', lw=3, ms=10, alpha=.5)
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
The fmt argument provides a shorthand for specifying the marker shape, line style, and line color. It can be provided as the third argument to plt.plot.
plt.plot(years, apples, 's-b')
plt.plot(years, oranges, 'o--r')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '8-g')
plt.plot(years, oranges, 'd--y')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '1-c')
plt.plot(years, oranges, '2--m')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
If you don't specify a line style in fmt, only markers are drawn.
plt.plot(years, apples, 'or')
plt.title("Yield of Apples (tons per hectare)");
plt.plot(years, oranges, 'or')
plt.title("Yield of Oranges (tons per hectare)");
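To check how a fmt string is interpreted, the following sketch draws the same data three ways and reads the resulting style back from the returned Line2D objects. The names line_fmt, line_kw, and line_markers are illustrative.

```python
import matplotlib.pyplot as plt

years = range(2000, 2012)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934, 0.936, 0.937, 0.9375, 0.9372, 0.939]

plt.figure()
# Shorthand: square markers, solid line, blue
line_fmt = plt.plot(years, apples, 's-b')[0]
# The same style spelled out with keyword arguments
line_kw = plt.plot(years, apples, marker='s', linestyle='-', color='b')[0]
# No line style in fmt, so only the markers are drawn
line_markers = plt.plot(years, apples, 'or')[0]

for line in (line_fmt, line_kw, line_markers):
    print(line.get_marker(), line.get_linestyle(), line.get_color())
```

The first two lines report identical settings, while the third reports a line style of 'None', which is how Matplotlib records that no connecting line is drawn.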
You can use the plt.figure function to change the size of the figure.
plt.figure(figsize=(2,2))
plt.plot(years, oranges, 'or')
plt.title("Yield of Oranges (tons per hectare)");
plt.figure(figsize=(8,4))
plt.plot(years, oranges, 'or')
plt.title("Yield of Oranges (tons per hectare)");
plt.figure(figsize=(12,6))
plt.plot(years, oranges, 'or')
plt.title("Yield of Oranges (tons per hectare)");
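The figsize values are a width and a height in inches. As a small sketch, the size in pixels can be worked out from the figure's dpi setting, which depends on your environment, so no specific pixel values are assumed here.

```python
import matplotlib.pyplot as plt

# figsize is (width, height) in inches
fig = plt.figure(figsize=(8, 4))

width_in, height_in = fig.get_size_inches()
# Pixel dimensions follow from the figure's dots-per-inch setting
width_px = width_in * fig.dpi
height_px = height_in * fig.dpi
print(width_in, height_in, fig.dpi, width_px, height_px)
```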
An easy way to make your charts look beautiful is to use some default styles from the Seaborn library. These can be applied globally using the sns.set_style function. The predefined styles are darkgrid, whitegrid, dark, white, and ticks.
sns.set_style('whitegrid')
plt.plot(years, apples, 's-b')
plt.plot(years, oranges, 'o--r')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '8-g')
plt.plot(years, oranges, 'd--y')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '1-c')
plt.plot(years, oranges, '2--m')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
sns.set_style('darkgrid')
plt.plot(years, apples, 's-b')
plt.plot(years, oranges, 'o--r')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '8-g')
plt.plot(years, oranges, 'd--y')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '1-c')
plt.plot(years, oranges, '2--m')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
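Because sns.set_style affects every later plot, it can be handy to limit a style to a single chart. One way to do that, sketched below, is Seaborn's axes_style function, which returns the settings for a style and can also be used as a context manager. The variable darkgrid_settings is illustrative.

```python
import matplotlib.pyplot as plt
import seaborn as sns

years = range(2000, 2012)
apples = [0.895, 0.91, 0.919, 0.926, 0.929, 0.931, 0.934, 0.936, 0.937, 0.9375, 0.9372, 0.939]

# Inspect the settings a style would apply, without changing anything
darkgrid_settings = sns.axes_style('darkgrid')
print(darkgrid_settings['axes.grid'], darkgrid_settings['axes.facecolor'])

# Apply the style only to plots created inside the with block
with sns.axes_style('whitegrid'):
    plt.figure()
    plt.plot(years, apples, 's-b')
    plt.title('Yield of Apples (tons per hectare)')
```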
You can also edit default styles directly by modifying the matplotlib.rcParams dictionary.
import matplotlib
matplotlib.rcParams
RcParams({'_internal.classic_mode': False,
'agg.path.chunksize': 0,
'animation.bitrate': -1,
'animation.codec': 'h264',
'animation.convert_args': ['-layers', 'OptimizePlus'],
'animation.convert_path': 'convert',
'animation.embed_limit': 20.0,
'animation.ffmpeg_args': [],
'animation.ffmpeg_path': 'ffmpeg',
'animation.frame_format': 'png',
'animation.html': 'none',
'animation.writer': 'ffmpeg',
'axes.autolimit_mode': 'data',
'axes.axisbelow': True,
'axes.edgecolor': 'white',
'axes.facecolor': '#EAEAF2',
'axes.formatter.limits': [-5, 6],
'axes.formatter.min_exponent': 0,
'axes.formatter.offset_threshold': 4,
'axes.formatter.use_locale': False,
'axes.formatter.use_mathtext': False,
'axes.formatter.useoffset': True,
'axes.grid': True,
'axes.grid.axis': 'both',
'axes.grid.which': 'major',
'axes.labelcolor': '.15',
'axes.labelpad': 4.0,
'axes.labelsize': 'medium',
'axes.labelweight': 'normal',
'axes.linewidth': 0.8,
'axes.prop_cycle': cycler('color', ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']),
'axes.spines.bottom': True,
'axes.spines.left': True,
'axes.spines.right': True,
'axes.spines.top': True,
'axes.titlecolor': 'auto',
'axes.titlelocation': 'center',
'axes.titlepad': 6.0,
'axes.titlesize': 'large',
'axes.titleweight': 'normal',
'axes.titley': None,
'axes.unicode_minus': True,
'axes.xmargin': 0.05,
'axes.ymargin': 0.05,
'axes.zmargin': 0.05,
'axes3d.grid': True,
'axes3d.xaxis.panecolor': (0.95, 0.95, 0.95, 0.5),
'axes3d.yaxis.panecolor': (0.9, 0.9, 0.9, 0.5),
'axes3d.zaxis.panecolor': (0.925, 0.925, 0.925, 0.5),
'backend': 'module://matplotlib_inline.backend_inline',
'backend_fallback': True,
'boxplot.bootstrap': None,
'boxplot.boxprops.color': 'black',
'boxplot.boxprops.linestyle': '-',
'boxplot.boxprops.linewidth': 1.0,
'boxplot.capprops.color': 'black',
'boxplot.capprops.linestyle': '-',
'boxplot.capprops.linewidth': 1.0,
'boxplot.flierprops.color': 'black',
'boxplot.flierprops.linestyle': 'none',
'boxplot.flierprops.linewidth': 1.0,
'boxplot.flierprops.marker': 'o',
'boxplot.flierprops.markeredgecolor': 'black',
'boxplot.flierprops.markeredgewidth': 1.0,
'boxplot.flierprops.markerfacecolor': 'none',
'boxplot.flierprops.markersize': 6.0,
'boxplot.meanline': False,
'boxplot.meanprops.color': 'C2',
'boxplot.meanprops.linestyle': '--',
'boxplot.meanprops.linewidth': 1.0,
'boxplot.meanprops.marker': '^',
'boxplot.meanprops.markeredgecolor': 'C2',
'boxplot.meanprops.markerfacecolor': 'C2',
'boxplot.meanprops.markersize': 6.0,
'boxplot.medianprops.color': 'C1',
'boxplot.medianprops.linestyle': '-',
'boxplot.medianprops.linewidth': 1.0,
'boxplot.notch': False,
'boxplot.patchartist': False,
'boxplot.showbox': True,
'boxplot.showcaps': True,
'boxplot.showfliers': True,
'boxplot.showmeans': False,
'boxplot.vertical': True,
'boxplot.whiskerprops.color': 'black',
'boxplot.whiskerprops.linestyle': '-',
'boxplot.whiskerprops.linewidth': 1.0,
'boxplot.whiskers': 1.5,
'contour.algorithm': 'mpl2014',
'contour.corner_mask': True,
'contour.linewidth': None,
'contour.negative_linestyle': 'dashed',
'date.autoformatter.day': '%Y-%m-%d',
'date.autoformatter.hour': '%m-%d %H',
'date.autoformatter.microsecond': '%M:%S.%f',
'date.autoformatter.minute': '%d %H:%M',
'date.autoformatter.month': '%Y-%m',
'date.autoformatter.second': '%H:%M:%S',
'date.autoformatter.year': '%Y',
'date.converter': 'auto',
'date.epoch': '1970-01-01T00:00:00',
'date.interval_multiples': True,
'docstring.hardcopy': False,
'errorbar.capsize': 0.0,
'figure.autolayout': False,
'figure.constrained_layout.h_pad': 0.04167,
'figure.constrained_layout.hspace': 0.02,
'figure.constrained_layout.use': False,
'figure.constrained_layout.w_pad': 0.04167,
'figure.constrained_layout.wspace': 0.02,
'figure.dpi': 100.0,
'figure.edgecolor': 'white',
'figure.facecolor': 'white',
'figure.figsize': [6.4, 4.8],
'figure.frameon': True,
'figure.hooks': [],
'figure.labelsize': 'large',
'figure.labelweight': 'normal',
'figure.max_open_warning': 20,
'figure.raise_window': True,
'figure.subplot.bottom': 0.11,
'figure.subplot.hspace': 0.2,
'figure.subplot.left': 0.125,
'figure.subplot.right': 0.9,
'figure.subplot.top': 0.88,
'figure.subplot.wspace': 0.2,
'figure.titlesize': 'large',
'figure.titleweight': 'normal',
'font.cursive': ['Apple Chancery',
'Textile',
'Zapf Chancery',
'Sand',
'Script MT',
'Felipa',
'Comic Neue',
'Comic Sans MS',
'cursive'],
'font.family': ['sans-serif'],
'font.fantasy': ['Chicago',
'Charcoal',
'Impact',
'Western',
'Humor Sans',
'xkcd',
'fantasy'],
'font.monospace': ['DejaVu Sans Mono',
'Bitstream Vera Sans Mono',
'Computer Modern Typewriter',
'Andale Mono',
'Nimbus Mono L',
'Courier New',
'Courier',
'Fixed',
'Terminal',
'monospace'],
'font.sans-serif': ['Arial',
'DejaVu Sans',
'Liberation Sans',
'Bitstream Vera Sans',
'sans-serif'],
'font.serif': ['DejaVu Serif',
'Bitstream Vera Serif',
'Computer Modern Roman',
'New Century Schoolbook',
'Century Schoolbook L',
'Utopia',
'ITC Bookman',
'Bookman',
'Nimbus Roman No9 L',
'Times New Roman',
'Times',
'Palatino',
'Charter',
'serif'],
'font.size': 10.0,
'font.stretch': 'normal',
'font.style': 'normal',
'font.variant': 'normal',
'font.weight': 'normal',
'grid.alpha': 1.0,
'grid.color': 'white',
'grid.linestyle': '-',
'grid.linewidth': 0.8,
'hatch.color': 'black',
'hatch.linewidth': 1.0,
'hist.bins': 10,
'image.aspect': 'equal',
'image.cmap': 'rocket',
'image.composite_image': True,
'image.interpolation': 'antialiased',
'image.lut': 256,
'image.origin': 'upper',
'image.resample': True,
'interactive': True,
'keymap.back': ['left', 'c', 'backspace', 'MouseButton.BACK'],
'keymap.copy': ['ctrl+c', 'cmd+c'],
'keymap.forward': ['right', 'v', 'MouseButton.FORWARD'],
'keymap.fullscreen': ['f', 'ctrl+f'],
'keymap.grid': ['g'],
'keymap.grid_minor': ['G'],
'keymap.help': ['f1'],
'keymap.home': ['h', 'r', 'home'],
'keymap.pan': ['p'],
'keymap.quit': ['ctrl+w', 'cmd+w', 'q'],
'keymap.quit_all': [],
'keymap.save': ['s', 'ctrl+s'],
'keymap.xscale': ['k', 'L'],
'keymap.yscale': ['l'],
'keymap.zoom': ['o'],
'legend.borderaxespad': 0.5,
'legend.borderpad': 0.4,
'legend.columnspacing': 2.0,
'legend.edgecolor': '0.8',
'legend.facecolor': 'inherit',
'legend.fancybox': True,
'legend.fontsize': 'medium',
'legend.framealpha': 0.8,
'legend.frameon': True,
'legend.handleheight': 0.7,
'legend.handlelength': 2.0,
'legend.handletextpad': 0.8,
'legend.labelcolor': 'None',
'legend.labelspacing': 0.5,
'legend.loc': 'best',
'legend.markerscale': 1.0,
'legend.numpoints': 1,
'legend.scatterpoints': 1,
'legend.shadow': False,
'legend.title_fontsize': None,
'lines.antialiased': True,
'lines.color': 'C0',
'lines.dash_capstyle': <CapStyle.butt: 'butt'>,
'lines.dash_joinstyle': <JoinStyle.round: 'round'>,
'lines.dashdot_pattern': [6.4, 1.6, 1.0, 1.6],
'lines.dashed_pattern': [3.7, 1.6],
'lines.dotted_pattern': [1.0, 1.65],
'lines.linestyle': '-',
'lines.linewidth': 1.5,
'lines.marker': 'None',
'lines.markeredgecolor': 'auto',
'lines.markeredgewidth': 1.0,
'lines.markerfacecolor': 'auto',
'lines.markersize': 6.0,
'lines.scale_dashes': True,
'lines.solid_capstyle': <CapStyle.round: 'round'>,
'lines.solid_joinstyle': <JoinStyle.round: 'round'>,
'markers.fillstyle': 'full',
'mathtext.bf': 'sans:bold',
'mathtext.cal': 'cursive',
'mathtext.default': 'it',
'mathtext.fallback': 'cm',
'mathtext.fontset': 'dejavusans',
'mathtext.it': 'sans:italic',
'mathtext.rm': 'sans',
'mathtext.sf': 'sans',
'mathtext.tt': 'monospace',
'patch.antialiased': True,
'patch.edgecolor': 'w',
'patch.facecolor': 'C0',
'patch.force_edgecolor': True,
'patch.linewidth': 1.0,
'path.effects': [],
'path.simplify': True,
'path.simplify_threshold': 0.111111111111,
'path.sketch': None,
'path.snap': True,
'pcolor.shading': 'auto',
'pcolormesh.snap': True,
'pdf.compression': 6,
'pdf.fonttype': 3,
'pdf.inheritcolor': False,
'pdf.use14corefonts': False,
'pgf.preamble': '',
'pgf.rcfonts': True,
'pgf.texsystem': 'xelatex',
'polaraxes.grid': True,
'ps.distiller.res': 6000,
'ps.fonttype': 3,
'ps.papersize': 'letter',
'ps.useafm': False,
'ps.usedistiller': None,
'savefig.bbox': None,
'savefig.directory': '~',
'savefig.dpi': 'figure',
'savefig.edgecolor': 'auto',
'savefig.facecolor': 'auto',
'savefig.format': 'png',
'savefig.orientation': 'portrait',
'savefig.pad_inches': 0.1,
'savefig.transparent': False,
'scatter.edgecolors': 'face',
'scatter.marker': 'o',
'svg.fonttype': 'path',
'svg.hashsalt': None,
'svg.image_inline': True,
'text.antialiased': True,
'text.color': '.15',
'text.hinting': 'force_autohint',
'text.hinting_factor': 8,
'text.kerning_factor': 0,
'text.latex.preamble': '',
'text.parse_math': True,
'text.usetex': False,
'timezone': 'UTC',
'tk.window_focus': False,
'toolbar': 'toolbar2',
'webagg.address': '127.0.0.1',
'webagg.open_in_browser': True,
'webagg.port': 8988,
'webagg.port_retries': 50,
'xaxis.labellocation': 'center',
'xtick.alignment': 'center',
'xtick.bottom': False,
'xtick.color': '.15',
'xtick.direction': 'out',
'xtick.labelbottom': True,
'xtick.labelcolor': 'inherit',
'xtick.labelsize': 'medium',
'xtick.labeltop': False,
'xtick.major.bottom': True,
'xtick.major.pad': 3.5,
'xtick.major.size': 3.5,
'xtick.major.top': True,
'xtick.major.width': 0.8,
'xtick.minor.bottom': True,
'xtick.minor.pad': 3.4,
'xtick.minor.size': 2.0,
'xtick.minor.top': True,
'xtick.minor.visible': False,
'xtick.minor.width': 0.6,
'xtick.top': False,
'yaxis.labellocation': 'center',
'ytick.alignment': 'center_baseline',
'ytick.color': '.15',
'ytick.direction': 'out',
'ytick.labelcolor': 'inherit',
'ytick.labelleft': True,
'ytick.labelright': False,
'ytick.labelsize': 'medium',
'ytick.left': False,
'ytick.major.left': True,
'ytick.major.pad': 3.5,
'ytick.major.right': True,
'ytick.major.size': 3.5,
'ytick.major.width': 0.8,
'ytick.minor.left': True,
'ytick.minor.pad': 3.4,
'ytick.minor.right': True,
'ytick.minor.size': 2.0,
'ytick.minor.visible': False,
'ytick.minor.width': 0.6,
'ytick.right': False})
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9,5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
plt.plot(years, oranges, 'or')
plt.title("Yield of Oranges (tons per hectare)");
plt.plot(years, apples, 's-b')
plt.plot(years, oranges, 'o--r')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '8-g')
plt.plot(years, oranges, 'd--y')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
plt.plot(years, apples, '1-c')
plt.plot(years, oranges, '2--m')
plt.xlabel('Year')
plt.ylabel('Yield (tons per hectare)')
plt.title('Crop Yield in Kanto')
plt.legend(['Apples', 'Oranges']);
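Assignments to matplotlib.rcParams stay in effect for the rest of the session. If a change is only needed for one chart, a possible approach is the plt.rc_context context manager, sketched here with the same font size and figure size used above. The font_size_* variables only record the setting at each stage.

```python
import matplotlib
import matplotlib.pyplot as plt

years = range(2000, 2012)
oranges = [0.962, 0.941, 0.930, 0.923, 0.918, 0.908, 0.907, 0.904, 0.901, 0.898, 0.9, 0.896]

font_size_before = matplotlib.rcParams['font.size']

# The overrides apply only inside the with block
with plt.rc_context({'font.size': 14, 'figure.figsize': (9, 5)}):
    font_size_inside = matplotlib.rcParams['font.size']
    plt.figure()
    plt.plot(years, oranges, 'or')
    plt.title('Yield of Oranges (tons per hectare)')

# Outside the block, the earlier value is back
font_size_after = matplotlib.rcParams['font.size']
print(font_size_before, font_size_inside, font_size_after)
```

To discard all changes at once, matplotlib.rcdefaults() resets the settings to Matplotlib's built-in defaults.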
In a scatter plot, the values of 2 variables are plotted as points on a 2-dimensional grid. Additionally, you can also use a third variable to determine the size or color of the points. Let's try out an example.
The Iris flower dataset provides sample measurements of sepals and petals for three species of flowers. The Iris dataset is included with the Seaborn library and can be loaded as a Pandas data frame.
flowers_df = sns.load_dataset('iris')
flowers_df
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
150 rows × 5 columns
flowers_df.species.unique()
array(['setosa', 'versicolor', 'virginica'], dtype=object)
plt.plot(flowers_df.sepal_length, flowers_df.sepal_width);
The output is not very informative as there are too many combinations of the two properties within the dataset. There doesn't seem to be a simple relationship between them.
We can use a scatter plot to visualize how sepal length & sepal width vary using the scatterplot function from the seaborn module (imported as sns).
sns.scatterplot(x=flowers_df.sepal_length, y=flowers_df.sepal_width );
sns.scatterplot(x=flowers_df.petal_length, y=flowers_df.petal_width );
Notice how the points in the above plot seem to form distinct clusters with some outliers. We can color the dots using the flower species as a hue. We can also make the points larger using the s argument.
sns.scatterplot(x=flowers_df.sepal_length,
y=flowers_df.sepal_width,
hue=flowers_df.species,
s=100);
Adding hues makes the plot more informative. We can immediately tell that Setosa flowers have a smaller sepal length but a larger sepal width, while the opposite is true for Virginica flowers.
Since Seaborn uses Matplotlib's plotting functions internally, we can use functions like plt.figure and plt.title to modify the figure.
plt.figure(figsize=(12,6))
plt.title('Sepal Dimension')
sns.scatterplot(x=flowers_df.sepal_length,
y=flowers_df.sepal_width,
hue=flowers_df.species,
s=100);
plt.figure(figsize=(12,6))
plt.title('Petal Dimension')
sns.scatterplot(x=flowers_df.petal_length,
y=flowers_df.petal_width,
hue=flowers_df.species,
s=100);
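The hue argument handles a categorical third variable. For a numeric third variable, one option is Matplotlib's own plt.scatter, sketched below with five rows copied from the table above (rows 0 to 2 and 145 to 146). The scaling factor of 40 for the marker sizes is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt

# Five rows copied from the Iris table shown earlier
sepal_length = [5.1, 4.9, 4.7, 6.7, 6.3]
sepal_width = [3.5, 3.0, 3.2, 3.0, 2.5]
petal_length = [1.4, 1.4, 1.3, 5.2, 5.0]

plt.figure()
# c maps a third variable to color; s sets the marker area in points squared
sizes = [40 * value for value in petal_length]
points = plt.scatter(sepal_length, sepal_width, c=petal_length, s=sizes)
plt.colorbar(points, label='Petal length')
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')
plt.title('Sepal Dimensions, Colored and Sized by Petal Length')
```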
A histogram represents the distribution of a variable by creating bins (intervals) along the range of values and showing vertical bars to indicate the number of observations in each bin.
For example, let's visualize the distribution of values of sepal width in the flowers dataset. We can use the plt.hist function to create a histogram.
# Load data into a Pandas dataframe
flowers_df = sns.load_dataset("iris")
flowers_df.sepal_width
0 3.5
1 3.0
2 3.2
3 3.1
4 3.6
...
145 3.0
146 2.5
147 3.0
148 3.4
149 3.0
Name: sepal_width, Length: 150, dtype: float64
flowers_df.describe()
| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.000000 | 150.000000 | 150.000000 | 150.000000 |
| mean | 5.843333 | 3.057333 | 3.758000 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.300000 | 2.000000 | 1.000000 | 0.100000 |
| 25% | 5.100000 | 2.800000 | 1.600000 | 0.300000 |
| 50% | 5.800000 | 3.000000 | 4.350000 | 1.300000 |
| 75% | 6.400000 | 3.300000 | 5.100000 | 1.800000 |
| max | 7.900000 | 4.400000 | 6.900000 | 2.500000 |
plt.title('Distribution of Sepal Width')
plt.hist(flowers_df.sepal_width);
plt.title('Distribution of Sepal Length')
plt.hist(flowers_df.sepal_length);
plt.title('Distribution of Petal Width')
plt.hist(flowers_df.petal_width);
plt.title('Distribution of Petal Length')
plt.hist(flowers_df.petal_length);
We can immediately see that the sepal widths lie in the range 2.0 - 4.5, and around 35 values are in the range 2.9 - 3.1, which seems to be the most populous bin.
We can control the number of bins or the size of each one using the bins argument.
# Specifying the number of bins
plt.hist(flowers_df.sepal_width, bins=5);
import numpy as np
np.arange(2, 5, 0.25)
array([2. , 2.25, 2.5 , 2.75, 3. , 3.25, 3.5 , 3.75, 4. , 4.25, 4.5 ,
4.75])
# Specifying the boundaries of each bin
plt.hist(flowers_df.sepal_width, bins=np.arange(2, 5, 0.25));
dat_bin_arange = np.arange(2,5,0.25)
plt.hist(flowers_df.sepal_width, bins=dat_bin_arange);
plt.hist(flowers_df.sepal_length, bins=dat_bin_arange);
# Bins of unequal sizes
plt.hist(flowers_df.sepal_width, bins=[1,3,4,4.5]);
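To make the effect of the bin boundaries concrete, this sketch counts values per bin with NumPy's np.histogram, which performs the same kind of binning that a histogram plot displays. It uses only ten sepal widths copied from the first and last rows of the table above, so the counts are not those of the full dataset.

```python
import numpy as np

# Ten sepal widths copied from rows 0 to 4 and 145 to 149 of the table
sepal_widths = [3.5, 3.0, 3.2, 3.1, 3.6, 3.0, 2.5, 3.0, 3.4, 3.0]

# 12 boundaries define 11 bins, each 0.25 wide
bin_edges = np.arange(2, 5, 0.25)
counts, edges = np.histogram(sepal_widths, bins=bin_edges)

# Print the range and the number of observations for each bin
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f'{left:.2f} to {right:.2f}: {count}')
```

Each bin includes its left boundary and excludes its right one, except the last bin, which includes both.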
Similar to line charts, we can draw multiple histograms in a single chart. We can reduce each histogram's opacity so that one histogram's bars don't hide the others'.
Let's draw separate histograms for each species of flowers.
setosa_df = flowers_df[flowers_df.species == 'setosa']
versicolor_df = flowers_df[flowers_df.species == 'versicolor']
virginica_df = flowers_df[flowers_df.species == 'virginica']
setosa_df
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
| 10 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
| 11 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
| 12 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
| 13 | 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 14 | 5.8 | 4.0 | 1.2 | 0.2 | setosa |
| 15 | 5.7 | 4.4 | 1.5 | 0.4 | setosa |
| 16 | 5.4 | 3.9 | 1.3 | 0.4 | setosa |
| 17 | 5.1 | 3.5 | 1.4 | 0.3 | setosa |
| 18 | 5.7 | 3.8 | 1.7 | 0.3 | setosa |
| 19 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
| 20 | 5.4 | 3.4 | 1.7 | 0.2 | setosa |
| 21 | 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 22 | 4.6 | 3.6 | 1.0 | 0.2 | setosa |
| 23 | 5.1 | 3.3 | 1.7 | 0.5 | setosa |
| 24 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
| 25 | 5.0 | 3.0 | 1.6 | 0.2 | setosa |
| 26 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |
| 27 | 5.2 | 3.5 | 1.5 | 0.2 | setosa |
| 28 | 5.2 | 3.4 | 1.4 | 0.2 | setosa |
| 29 | 4.7 | 3.2 | 1.6 | 0.2 | setosa |
| 30 | 4.8 | 3.1 | 1.6 | 0.2 | setosa |
| 31 | 5.4 | 3.4 | 1.5 | 0.4 | setosa |
| 32 | 5.2 | 4.1 | 1.5 | 0.1 | setosa |
| 33 | 5.5 | 4.2 | 1.4 | 0.2 | setosa |
| 34 | 4.9 | 3.1 | 1.5 | 0.2 | setosa |
| 35 | 5.0 | 3.2 | 1.2 | 0.2 | setosa |
| 36 | 5.5 | 3.5 | 1.3 | 0.2 | setosa |
| 37 | 4.9 | 3.6 | 1.4 | 0.1 | setosa |
| 38 | 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 39 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
| 40 | 5.0 | 3.5 | 1.3 | 0.3 | setosa |
| 41 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 43 | 5.0 | 3.5 | 1.6 | 0.6 | setosa |
| 44 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |
| 45 | 4.8 | 3.0 | 1.4 | 0.3 | setosa |
| 46 | 5.1 | 3.8 | 1.6 | 0.2 | setosa |
| 47 | 4.6 | 3.2 | 1.4 | 0.2 | setosa |
| 48 | 5.3 | 3.7 | 1.5 | 0.2 | setosa |
| 49 | 5.0 | 3.3 | 1.4 | 0.2 | setosa |
versicolor_df
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
| 55 | 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
| 56 | 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
| 57 | 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
| 58 | 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
| 59 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
| 60 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
| 61 | 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
| 62 | 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
| 63 | 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
| 64 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
| 65 | 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
| 66 | 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
| 67 | 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
| 68 | 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
| 69 | 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
| 70 | 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
| 71 | 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
| 72 | 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
| 73 | 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
| 74 | 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
| 76 | 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
| 77 | 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
| 78 | 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
| 79 | 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
| 80 | 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
| 81 | 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
| 82 | 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
| 83 | 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
| 84 | 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
| 85 | 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
| 86 | 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 87 | 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
| 88 | 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
| 89 | 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
| 90 | 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
| 91 | 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
| 92 | 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
| 93 | 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
| 94 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
| 95 | 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
| 96 | 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
| 97 | 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
| 98 | 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
| 99 | 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
virginica_df
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 100 | 6.3 | 3.3 | 6.0 | 2.5 | virginica |
| 101 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 102 | 7.1 | 3.0 | 5.9 | 2.1 | virginica |
| 103 | 6.3 | 2.9 | 5.6 | 1.8 | virginica |
| 104 | 6.5 | 3.0 | 5.8 | 2.2 | virginica |
| 105 | 7.6 | 3.0 | 6.6 | 2.1 | virginica |
| 106 | 4.9 | 2.5 | 4.5 | 1.7 | virginica |
| 107 | 7.3 | 2.9 | 6.3 | 1.8 | virginica |
| 108 | 6.7 | 2.5 | 5.8 | 1.8 | virginica |
| 109 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
| 110 | 6.5 | 3.2 | 5.1 | 2.0 | virginica |
| 111 | 6.4 | 2.7 | 5.3 | 1.9 | virginica |
| 112 | 6.8 | 3.0 | 5.5 | 2.1 | virginica |
| 113 | 5.7 | 2.5 | 5.0 | 2.0 | virginica |
| 114 | 5.8 | 2.8 | 5.1 | 2.4 | virginica |
| 115 | 6.4 | 3.2 | 5.3 | 2.3 | virginica |
| 116 | 6.5 | 3.0 | 5.5 | 1.8 | virginica |
| 117 | 7.7 | 3.8 | 6.7 | 2.2 | virginica |
| 118 | 7.7 | 2.6 | 6.9 | 2.3 | virginica |
| 119 | 6.0 | 2.2 | 5.0 | 1.5 | virginica |
| 120 | 6.9 | 3.2 | 5.7 | 2.3 | virginica |
| 121 | 5.6 | 2.8 | 4.9 | 2.0 | virginica |
| 122 | 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 123 | 6.3 | 2.7 | 4.9 | 1.8 | virginica |
| 124 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
| 125 | 7.2 | 3.2 | 6.0 | 1.8 | virginica |
| 126 | 6.2 | 2.8 | 4.8 | 1.8 | virginica |
| 127 | 6.1 | 3.0 | 4.9 | 1.8 | virginica |
| 128 | 6.4 | 2.8 | 5.6 | 2.1 | virginica |
| 129 | 7.2 | 3.0 | 5.8 | 1.6 | virginica |
| 130 | 7.4 | 2.8 | 6.1 | 1.9 | virginica |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 132 | 6.4 | 2.8 | 5.6 | 2.2 | virginica |
| 133 | 6.3 | 2.8 | 5.1 | 1.5 | virginica |
| 134 | 6.1 | 2.6 | 5.6 | 1.4 | virginica |
| 135 | 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 136 | 6.3 | 3.4 | 5.6 | 2.4 | virginica |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 | virginica |
| 138 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
| 139 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 140 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
| 141 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
| 142 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
| 143 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
| 144 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
plt.hist(setosa_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.hist(versicolor_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.legend(['Setosa', 'Versicolor']);
plt.hist(setosa_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.hist(virginica_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.legend(['Setosa', 'Virginica']);
plt.hist(versicolor_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.hist(virginica_df.sepal_width, alpha=0.4, bins=dat_bin_arange);
plt.legend(['Versicolor', 'Virginica']);
We can also stack multiple histograms on top of one another.
plt.title('Distribution of Sepal Width')
plt.hist([setosa_df.sepal_width, versicolor_df.sepal_width, virginica_df.sepal_width],
bins=dat_bin_arange,
stacked=True);
plt.legend(['Setosa', 'Versicolor', 'Virginica']);
Bar charts are quite similar to line charts, i.e., they show a sequence of values. However, a bar is shown for each value, rather than points connected by lines. We can use the plt.bar function to draw a bar chart.
years = range(2000, 2006)
apples = [0.35, 0.6, 0.9, 0.8, 0.65, 0.8]
oranges = [0.4, 0.8, 0.9, 0.7, 0.6, 0.8]
plt.plot(years, oranges);
plt.bar(years, oranges);
plt.bar(years, apples);
plt.bar(years, oranges);
plt.plot(years, oranges, 'o--r');
plt.title('Yield of Oranges');
plt.bar(years, apples);
plt.plot(years, apples, 'o--r');
plt.title('Yield of Apples');
Like histograms, we can stack bars on top of one another. We use the bottom argument of plt.bar to achieve this.
plt.bar(years, apples);
plt.bar(years, oranges, bottom=apples);
plt.bar(years, oranges);
plt.bar(years, apples, bottom=oranges);
plt.bar(years, apples);
plt.bar(years, oranges, bottom=apples);
plt.legend(['Apples', 'Oranges']);
plt.bar(years, oranges);
plt.bar(years, apples, bottom=oranges);
plt.legend(['Oranges', 'Apples']);
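The same bottom argument extends to more than two series if we keep a running total of the heights drawn so far. The sketch below assumes a third, purely hypothetical series called pears; the other values are the ones used above.

```python
import numpy as np
import matplotlib.pyplot as plt

years = range(2000, 2006)
apples = [0.35, 0.6, 0.9, 0.8, 0.65, 0.8]
oranges = [0.4, 0.8, 0.9, 0.7, 0.6, 0.8]
pears = [0.2, 0.3, 0.25, 0.4, 0.35, 0.3]  # hypothetical third series

# Running total of bar heights; each new series starts where the last one ended
bottom = np.zeros(len(years))
for label, values in [('Apples', apples), ('Oranges', oranges), ('Pears', pears)]:
    plt.bar(years, values, bottom=bottom, label=label)
    bottom = bottom + np.array(values)

plt.legend();
```

Passing label to each plt.bar call lets plt.legend pick up the names without having to repeat them in the right order.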
Let's look at another sample dataset included with Seaborn, called tips. The dataset contains information about the sex, time of day, total bill, and tip amount for customers visiting a restaurant over a week.
tips_df = sns.load_dataset('tips');
tips_df
| | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
| 240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
| 241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
| 242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
| 243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 |
244 rows × 7 columns
bill_sex_avg_df = tips_df.groupby('sex')[['total_bill']].mean()
bill_sex_avg_df
| sex | total_bill |
|---|---|
| Male | 20.744076 |
| Female | 18.056897 |
plt.bar(bill_sex_avg_df.index, bill_sex_avg_df.total_bill);
bill_avg_df = tips_df.groupby('day')[['total_bill']].mean()
bill_avg_df
| day | total_bill |
|---|---|
| Thur | 17.682742 |
| Fri | 17.151579 |
| Sat | 20.441379 |
| Sun | 21.410000 |
plt.bar(bill_avg_df.index, bill_avg_df.total_bill);
We might want to draw a bar chart to visualize how the average bill amount varies across different days of the week. One way to do this is to compute the day-wise averages with groupby and then pass them to plt.bar, as the bill_avg_df cells above do.
However, since this is a very common use case, the Seaborn library provides a barplot function which can automatically compute averages.
sns.barplot(x='day', y='total_bill', data=tips_df);
sns.barplot(x='sex', y='total_bill', data=tips_df);
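The lines cutting each bar are error bars. By default, Seaborn draws a 95% confidence interval for the mean of each group, so a longer line indicates more uncertainty about that average. For instance, the uncertainty in the average total bill appears relatively high on Fridays and low on Saturdays.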
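By default, sns.barplot estimates the interval shown by these lines by bootstrapping. The sketch below illustrates that idea with NumPy on made-up bill amounts (the bills array is hypothetical). It is a rough illustration of the technique, not Seaborn's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical bill amounts for a single day
bills = np.array([12.5, 18.0, 22.3, 9.8, 30.1, 15.6, 27.4, 19.9])

# Resample with replacement many times and record the mean of each resample
boot_means = np.array([
    rng.choice(bills, size=len(bills), replace=True).mean()
    for _ in range(1000)
])

# The middle 95% of the resampled means approximates the confidence interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f'mean = {bills.mean():.2f}, 95% CI ~ ({ci_low:.2f}, {ci_high:.2f})')
```

Groups with fewer or more widely spread values tend to produce a wider interval, which is why some bars have longer lines than others.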
We can also specify a hue argument to compare bar plots side-by-side based on a third feature, e.g., sex.
sns.barplot(x='day', y='total_bill', hue='sex', data=tips_df);
sns.barplot(x='day', y='total_bill', hue='smoker', data=tips_df);
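Conceptually, adding hue means one average is computed for every combination of the x category and the hue category. Here is a small sketch of that computation with pandas, using a hypothetical six-row dataframe (mini_df) in place of the real tips data.

```python
import pandas as pd

# Hypothetical mini version of the tips data
mini_df = pd.DataFrame({
    'day': ['Sat', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun'],
    'sex': ['Male', 'Female', 'Male', 'Female', 'Female', 'Male'],
    'total_bill': [20.0, 16.0, 24.0, 18.0, 22.0, 26.0],
})

# One mean per (day, sex) pair; unstack moves the sex level into columns
grouped = mini_df.groupby(['day', 'sex'])['total_bill'].mean().unstack()
print(grouped)

# One cluster of bars per day, one bar per sex within each cluster
grouped.plot(kind='bar');
```

The resulting table has one row per day and one column per sex, which maps directly onto the side-by-side bars.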
You can make the bars horizontal simply by switching the axes.
sns.barplot(x='total_bill', y='day', hue='sex', data=tips_df);
sns.barplot(x='total_bill', y='day', hue='smoker', data=tips_df);
A heatmap is used to visualize 2-dimensional data like a matrix or a table using colors. The best way to understand it is by looking at an example. We'll use another sample dataset from Seaborn, called flights, to visualize monthly passenger footfall at an airport over 12 years.
df = sns.load_dataset('flights');
df
| | year | month | passengers |
|---|---|---|---|
| 0 | 1949 | Jan | 112 |
| 1 | 1949 | Feb | 118 |
| 2 | 1949 | Mar | 132 |
| 3 | 1949 | Apr | 129 |
| 4 | 1949 | May | 121 |
| ... | ... | ... | ... |
| 139 | 1960 | Aug | 606 |
| 140 | 1960 | Sep | 508 |
| 141 | 1960 | Oct | 461 |
| 142 | 1960 | Nov | 390 |
| 143 | 1960 | Dec | 432 |
144 rows × 3 columns
plt.plot(df.passengers);
flights_df = sns.load_dataset('flights').pivot(index='month', columns='year', values='passengers');
flights_df
| month / year | 1949 | 1950 | 1951 | 1952 | 1953 | 1954 | 1955 | 1956 | 1957 | 1958 | 1959 | 1960 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jan | 112 | 115 | 145 | 171 | 196 | 204 | 242 | 284 | 315 | 340 | 360 | 417 |
| Feb | 118 | 126 | 150 | 180 | 196 | 188 | 233 | 277 | 301 | 318 | 342 | 391 |
| Mar | 132 | 141 | 178 | 193 | 236 | 235 | 267 | 317 | 356 | 362 | 406 | 419 |
| Apr | 129 | 135 | 163 | 181 | 235 | 227 | 269 | 313 | 348 | 348 | 396 | 461 |
| May | 121 | 125 | 172 | 183 | 229 | 234 | 270 | 318 | 355 | 363 | 420 | 472 |
| Jun | 135 | 149 | 178 | 218 | 243 | 264 | 315 | 374 | 422 | 435 | 472 | 535 |
| Jul | 148 | 170 | 199 | 230 | 264 | 302 | 364 | 413 | 465 | 491 | 548 | 622 |
| Aug | 148 | 170 | 199 | 242 | 272 | 293 | 347 | 405 | 467 | 505 | 559 | 606 |
| Sep | 136 | 158 | 184 | 209 | 237 | 259 | 312 | 355 | 404 | 404 | 463 | 508 |
| Oct | 119 | 133 | 162 | 191 | 211 | 229 | 274 | 306 | 347 | 359 | 407 | 461 |
| Nov | 104 | 114 | 146 | 172 | 180 | 203 | 237 | 271 | 305 | 310 | 362 | 390 |
| Dec | 118 | 140 | 166 | 194 | 201 | 229 | 278 | 306 | 336 | 337 | 405 | 432 |
flights_df is a matrix with one row for each month and one column for each year. The values show the number of passengers (in thousands) that visited the airport in a specific month of a year. We can use the sns.heatmap function to visualize the footfall at the airport.
plt.title('No. of Passengers (1000s)')
sns.heatmap(flights_df);
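To see the reshaping step in isolation, here is a minimal sketch that pivots a four-row subset of the values shown above from long format into a matrix and colors the cells with plt.imshow. The names long_df and wide_df are made up for this example, and plt.imshow is used only as a bare-bones stand-in for what a heatmap does.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Long format: one row per (year, month) pair
long_df = pd.DataFrame({
    'year': [1949, 1949, 1950, 1950],
    'month': ['Jan', 'Feb', 'Jan', 'Feb'],
    'passengers': [112, 118, 115, 126],
})

# Wide format: one row per month, one column per year
# (plain string months are sorted alphabetically, so 'Feb' comes before 'Jan')
wide_df = long_df.pivot(index='month', columns='year', values='passengers')
print(wide_df)

# Color each cell by its value, which is the core idea of a heatmap
plt.imshow(wide_df.values, cmap='Blues')
plt.xticks(range(wide_df.shape[1]), wide_df.columns)
plt.yticks(range(wide_df.shape[0]), wide_df.index)
plt.colorbar();
```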
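The brighter colors indicate a higher footfall at the airport. By looking at the graph, we can infer two things: the footfall in any given year tends to be highest around July and August, and the footfall in any given month tends to grow year by year.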
We can also display the actual values in each block by specifying annot=True, and change the color palette using the cmap argument.
plt.title('No. of Passengers (1000s)')
sns.heatmap(flights_df, fmt='d', annot=True, cmap='Blues');
We can also use Matplotlib to display images. Let's download an image from the internet.
from urllib.request import urlretrieve
urlretrieve('https://i.imgur.com/SkPbq.jpg', 'chart.jpg');
Before we can display an image, we have to read it into memory using the PIL module.
from PIL import Image
img = Image.open('chart.jpg')
type(img)
PIL.JpegImagePlugin.JpegImageFile
An image loaded using PIL can be converted into a 3-dimensional numpy array containing pixel intensities for the red, green & blue (RGB) channels of the image. We can do the conversion using np.array.
import numpy as np
img_array = np.array(img)
img_array.shape
(481, 640, 3)
img_array
array([[[10, 10, 10],
[ 0, 0, 0],
[ 0, 0, 0],
...,
[ 0, 0, 0],
[ 4, 4, 4],
[ 0, 0, 0]],
[[ 0, 0, 0],
[ 3, 3, 3],
[10, 10, 10],
...,
[ 2, 2, 2],
[ 6, 6, 6],
[ 0, 0, 0]],
[[ 1, 1, 1],
[ 4, 4, 4],
[11, 11, 11],
...,
[10, 10, 10],
[ 2, 2, 2],
[ 0, 0, 0]],
...,
[[ 6, 6, 6],
[ 0, 0, 0],
[ 1, 1, 1],
...,
[ 0, 0, 0],
[ 8, 8, 8],
[ 5, 5, 5]],
[[ 2, 2, 2],
[ 1, 1, 1],
[ 0, 0, 0],
...,
[ 0, 0, 0],
[10, 10, 10],
[ 0, 0, 0]],
[[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0],
...,
[ 0, 0, 0],
[ 0, 0, 0],
[ 0, 0, 0]]], dtype=uint8)
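The layout of such an array is easier to see on a tiny example. The sketch below builds a hypothetical 2x3 RGB image by hand (tiny_img is not related to the downloaded file) to show that the axes are ordered as rows, columns, and color channels.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical 2x3 image: axis 0 = rows (height), axis 1 = columns (width), axis 2 = RGB
tiny_img = np.zeros((2, 3, 3), dtype=np.uint8)
tiny_img[0, 0] = [255, 0, 0]  # top-left pixel: pure red
tiny_img[0, 1] = [0, 255, 0]  # top-middle pixel: pure green
tiny_img[1, 2] = [0, 0, 255]  # bottom-right pixel: pure blue

# Selecting one channel gives a 2-dimensional array of intensities
red_channel = tiny_img[:, :, 0]
print(tiny_img.shape, red_channel.shape)

plt.imshow(tiny_img);
```

In the output above, every pixel has equal red, green, and blue values, which is what we would expect from a grayscale picture.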
We can display the PIL image using plt.imshow.
plt.imshow(img);
We can turn off the axes & grid lines and show a title using the relevant functions.
plt.grid(False)
plt.title('Data science meme')
plt.axis('off')
plt.imshow(img);
To display a part of the image, we can simply select a slice from the numpy array.
plt.grid(False)
plt.axis('off')
plt.imshow(img_array[125:325, 105:305]);
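Note that the first slice selects rows (the vertical direction) and the second selects columns (the horizontal direction). A quick check on a hypothetical blank array with the same shape as img_array shows the size of the resulting crop.

```python
import numpy as np

# Hypothetical stand-in with the same shape as the image array above
fake_img = np.zeros((481, 640, 3), dtype=np.uint8)

# Rows 125 to 324 and columns 105 to 304; all three color channels are kept
crop = fake_img[125:325, 105:305]
print(crop.shape)  # (200, 200, 3)
```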
Matplotlib and Seaborn also support plotting multiple charts in a grid, using plt.subplots, which returns a set of axes for plotting.
Here's a single grid showing the different types of charts we've covered in this tutorial.
plt.subplots(2,3)
(<Figure size 900x500 with 6 Axes>,
array([[<Axes: >, <Axes: >, <Axes: >],
[<Axes: >, <Axes: >, <Axes: >]], dtype=object))
fig, axes= plt.subplots(2,3)
plt.tight_layout(pad=2)
axes
array([[<Axes: >, <Axes: >, <Axes: >],
[<Axes: >, <Axes: >, <Axes: >]], dtype=object)
axes.shape
(2, 3)
axes[0,0]
<Axes: >
fig, axes=plt.subplots(2,3)
axes[0,0].plot(years, oranges, 'o--r')
plt.tight_layout(pad=2)
fig, axes=plt.subplots(2,3)
axes[0,0].plot(years, oranges, 'o--b')
plt.tight_layout(pad=2)
fig, axes=plt.subplots(2,3)
axes[0,1].plot(years, oranges, 'o--r')
plt.tight_layout(pad=2)
fig, axes=plt.subplots(2,3)
axes[0,0].plot(years, apples, 'o--r')
axes[0,0].set_title('Yield of Apples')
plt.tight_layout(pad=2)
fig, axes=plt.subplots(2,3, figsize=(12,9))
axes[0,0].plot(years, apples, 's--b')
axes[0,0].plot(years, oranges, 'o--r')
axes[0,0].set_xlabel('Year')
axes[0,0].set_ylabel('Yield (tons per hectare)')
axes[0,0].set_title('Crop Yields in Kanto')
axes[0,0].legend(['Apples', 'Oranges'])
axes[0,1].set_title('Sepal Length vs. Sepal Width')
sns.scatterplot(x=flowers_df.sepal_length,
y=flowers_df.sepal_width,
hue=flowers_df.species,
s=100,
ax=axes[0,1])
plt.tight_layout(pad=2)
# Use the axes for plotting
fig, axes=plt.subplots(2,3, figsize=(12,9))
axes[0,0].plot(years, apples, 's--b')
axes[0,0].plot(years, oranges, 'o--r')
axes[0,0].set_xlabel('Year')
axes[0,0].set_ylabel('Yield (tons per hectare)')
axes[0,0].legend(['Apples', 'Oranges'])
axes[0,0].set_title('Crop Yields in Kanto')
# Pass the axes into seaborn
axes[0,1].set_title('Sepal Length vs. Sepal Width')
sns.scatterplot(x=flowers_df.sepal_length,
y=flowers_df.sepal_width,
hue=flowers_df.species,
s=100,
ax=axes[0,1]);
# Use the axes for plotting
axes[0,2].set_title('Distribution of Sepal Width')
axes[0,2].hist([setosa_df.sepal_width, versicolor_df.sepal_width, virginica_df.sepal_width],
bins=dat_bin_arange,
stacked=True);
axes[0,2].legend(['Setosa', 'Versicolor', 'Virginica']);
# Pass the axes into seaborn
axes[1,0].set_title('Restaurant bills')
sns.barplot(x='day', y='total_bill', hue='sex', data=tips_df, ax=axes[1,0]);
# Pass the axes into seaborn
axes[1,1].set_title('Flight traffic')
sns.heatmap(flights_df, cmap='Blues', ax=axes[1,1]);
# Plot an image using the axes
axes[1,2].set_title('Data Science Meme')
axes[1,2].imshow(img)
axes[1,2].grid(False)
axes[1,2].set_xticks([])
axes[1,2].set_yticks([])
plt.tight_layout(pad=2);
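When every cell of the grid holds the same kind of chart, indexing each axes by hand becomes repetitive. One possible shortcut is to loop over axes.flat, which walks the grid row by row. The sketch below uses made-up curves and titles purely for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 3, figsize=(9, 5))

x = np.linspace(0, 1, 50)

# axes.flat visits axes[0,0], axes[0,1], axes[0,2], axes[1,0], ... in order
for i, ax in enumerate(axes.flat):
    ax.plot(x, x ** (i + 1))
    ax.set_title(f'x^{i + 1}')  # hypothetical titles

fig.tight_layout(pad=2);
```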
Seaborn also provides a helper function sns.pairplot to automatically plot several different charts for pairs of features within a dataframe.
sns.pairplot(flowers_df, hue='species');
sns.pairplot(tips_df, hue='sex');
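To get a feel for what such a grid contains, here is a rough hand-built approximation using plt.subplots: a histogram of each feature on the diagonal and a scatter plot for every pair elsewhere. The features a, b and c are random, hypothetical data, and the real sns.pairplot differs in details such as styling and how hue is handled.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical numeric features
features = {
    'a': rng.normal(size=100),
    'b': rng.normal(size=100),
    'c': rng.normal(size=100),
}
names = list(features)

fig, axes = plt.subplots(len(names), len(names), figsize=(9, 9))
for i, row_name in enumerate(names):
    for j, col_name in enumerate(names):
        ax = axes[i, j]
        if i == j:
            # Diagonal: distribution of a single feature
            ax.hist(features[row_name])
        else:
            # Off-diagonal: relationship between two features
            ax.scatter(features[col_name], features[row_name], s=10)
        ax.set_xlabel(col_name)
        ax.set_ylabel(row_name)

fig.tight_layout();
```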